Submission to metric track

Introduction

Deciding between man and zone coverage is one of the most critical strategic decisions a defensive coordinator must make before each offensive play in American football. While experienced offensive coordinators and quarterbacks often rely on visual cues to identify these defensive schemes, the increasing availability of player tracking data offers a new avenue to uncover and analyze these tactics. A notable example is Amazon’s NFL Next Gen Stats model, which delivers coverage predictions during live broadcasts (see a snapshot of the 2024 Week 12 match between the Pittsburgh Steelers and Cleveland Browns). However, pre-snap motion does not seem to play a prominent role in these predictions (see Amazon), although it is a crucial element of modern offensive strategies and is often used to reveal a disguised defensive scheme.

Hence, our contribution explores the potential of the previously omitted information contained in pre-snap motion. While we similarly predict man or zone coverage once the teams are set before the snap, we further leverage the additional information of pre-snap player movements. Specifically, in addition to including rather naive post-motion features, we use a hidden Markov model (HMM) to model defenders’ trajectories based on hidden states, which represent the offensive players they may be guarding. Incorporating summary statistics of the state decoding results as features into the existing models substantially improves the predictive ability. This lays the groundwork for further analyses such as the evaluation of the effectiveness of pre-snap motion in uncovering defensive strategies.

Coverage Prediction

Data

We aim to forecast the defensive scheme (man or zone coverage) using the pff_passCoverage indicator in the play-by-play data. We omit plays tagged as others as well as plays with more than five offensive linemen or with two quarterbacks. Since we are specifically interested in analyzing pre-snap player movements, we concentrate on plays that contain any pre-snap motion. Ultimately, we end up with \(3985\) offensive plays in total, of which the defense played \(2973\) in zone and \(1012\) in man coverage.

Feature engineering

To accurately forecast the defensive scheme (man or zone coverage) for every play, we create various features derived from the tracking data. In particular, we conduct the following feature engineering steps. First, using all 11 players on each side, we compute the area spanned by the convex hull of a team as well as the largest \(y\) distance (i.e. the width of the hull) and the largest \(x\) distance (i.e. the length of the hull). Then, we select the five most relevant players on each side of the ball. For the offense, we omit the offensive line and the QB, which leaves five players composed of running backs, tight ends and wide receivers. For the defense, disregarding defensive linemen (NT, DT, DE), we select the five defenders closest to the five offensive players according to a weighted Euclidean distance that puts much more emphasis on the \(y\)-axis. From these 10 players, we derive features related to their (standardized) position, distance and orientation. Additionally, we extract relevant information from the play-by-play data, such as quarter, down, yards to go, home and away score and the remaining game time in the current half (in seconds). See the Appendix for a more detailed description of the features.
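The three convex hull features described above can be sketched as follows. This is a minimal pure-Python illustration and not the authors' implementation (which is written in R); the function names `convex_hull` and `hull_features` and the feature keys are hypothetical, and player positions are assumed to be given as `(x, y)` tuples for one team at line-set.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]  # (x, y) position of one player


def convex_hull(points: List[Point]) -> List[Point]:
    """Andrew's monotone chain; returns the hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o: Point, a: Point, b: Point) -> float:
        # z-component of (a - o) x (b - o); positive for a counter-clockwise turn
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower: List[Point] = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper: List[Point] = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each chain is the first point of the other one
    return lower[:-1] + upper[:-1]


def hull_features(points: List[Point]) -> Dict[str, float]:
    """Area, width (largest y distance) and length (largest x distance) of a team's hull."""
    hull = convex_hull(points)
    twice_area = 0.0
    for i in range(len(hull)):  # shoelace formula over consecutive hull vertices
        x1, y1 = hull[i]
        x2, y2 = hull[(i + 1) % len(hull)]
        twice_area += x1 * y2 - x2 * y1
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return {
        "hull_area": abs(twice_area) / 2.0,
        "hull_width_y": max(ys) - min(ys),
        "hull_length_x": max(xs) - min(xs),
    }
```

Applied once to the offense and once to the defense, this would yield the 6 hull-related features used in the models.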

Analysis

We train different models to predict whether the defense plays a man- or zone coverage scheme. Since the aim of the project is to show the effectiveness of pre-snap motion, we follow a three-step approach:

  1. Pre-motion models
  2. Naive post-motion models
  3. HMM post-motion models

In general, we have a limited dataset available (only 3985 plays) and therefore need to manage model complexity by controlling the number of features. Given the small dataset, we focus on 32 previously described basic features used in all 3 models: 6 convex hull related features, 20 player features, and 6 play-by-play features. In the Appendix, we provide a discussion on the choice of features.

1. Pre-motion models

First of all, we need a suitable basic model class for predicting man or zone coverage and opt for the following two. First, we fit a glmnet (elastic net) model, which performs implicit feature selection and is able to handle multicollinearity. Second, we use an xgboost model, which is able to capture non-linear effects (and interactions) and also handles multicollinearity. However, it requires careful hyperparameter tuning and generally performs better on larger data sets. For all models, we use 10-fold cross-validation on a suitable hyperparameter grid. [Comment: I think this part fits better in the introduction to the Analysis section, since we use the two model classes for all 3 models. That is also why the feature description appears above once more, because the 32 features are simply the same for all models.]

We fit the models with the previously described basic features. All of these features are derived at the time of line-set, which is why these models do not use any pre-snap motion information. These very basic models serve as baselines that allow us to measure the effect of pre-snap motion (features) in the following.

2. Naive post-motion models

In a second step, we extend our basic pre-motion model with naive post-motion features. To keep the complexity manageable, we derive only 6 additional post-motion features: for each team (offense and defense), we compute the maximum \(y\)-distance, the maximum \(x\)-distance and the total distance traveled until the snap.
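One possible reading of these three per-team features is sketched below. The exact definitions are not spelled out in the text, so the following is an assumption: the maximum \(x\)- and \(y\)-distances are taken as the largest displacement of any player relative to their position at line-set, and the total distance is the summed path length of all players. The function name `naive_motion_features` and the dictionary keys are hypothetical.

```python
import math
from typing import Dict, List, Tuple

Track = List[Tuple[float, float]]  # (x, y) per frame, from line-set to the snap


def naive_motion_features(team_tracks: Dict[str, Track]) -> Dict[str, float]:
    """Three motion summaries for one team (assumed definitions, see lead-in)."""
    max_dx = max_dy = total = 0.0
    for track in team_tracks.values():
        x0, y0 = track[0]  # position at line-set
        max_dx = max(max_dx, max(abs(x - x0) for x, _ in track))
        max_dy = max(max_dy, max(abs(y - y0) for _, y in track))
        # Path length: sum of frame-to-frame Euclidean distances
        total += sum(math.dist(a, b) for a, b in zip(track, track[1:]))
    return {"max_x_dist": max_dx, "max_y_dist": max_dy, "total_dist": total}
```

Calling the function once for the offense and once for the defense gives the 6 additional features.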

3. HMM post-motion models

3.1 Hidden Markov model

The previously described models primarily serve as baselines for evaluating the impact of incorporating pre-snap motion information into predicting defensive schemes. The main contribution of our project lies in the effective integration of this information using a hidden Markov model (HMM).

Specifically, we use results derived from an HMM fitted to the pre-snap motion data as further features in the previously described models. To achieve this, we model the movements of the five defensive players during pre-snap motion by an HMM (see the Appendix for an in-depth description). In particular, we assume that each defender’s \(y\)-coordinate at each time point \(t\) is a realization from a Gaussian distribution with mean equal to the \(y\)-coordinate of the offensive player they are guarding and an estimated standard deviation. HMMs are particularly well-suited for this task since the guarding assignments are not directly observed but need to be inferred from the defenders’ responses to the offensive players’ movements. Hence, the model naturally treats this information as a latent state variable, making each observation a realization from a mixture of Gaussian distributions (see Franks et al. 2015 for a similar approach in basketball). The ultimate goal is to use the fitted model for state decoding, i.e., inferring information on the guarding assignment based on the observations made. HMMs excel in this context as they leverage not only the defender’s \(y\)-coordinate and the \(y\)-coordinates of all offensive players potentially guarded at the current time point, but also probabilistic information from the previous and subsequent observations. State decoding can either provide us with fully probabilistic information, i.e., the probability of each defender guarding each offensive player at each time point, or a single most likely guarding assignment for each defender at each time point. We use both to enhance our post-motion models and the latter for visualization purposes. For details on model fitting and state decoding, also see the Appendix.

The results of the HMM are exemplified using the following video and animation. They display a touchdown of the Kansas City Chiefs against the Arizona Cardinals in Week 1 of the 2022 NFL season. We can see that, pre-snap, Mecole Hardman (KC #17) is in motion. He is immediately followed by the defender Marco Wilson (AZ #20), which is a clear indication of man coverage.

After state decoding using the fitted HMM, we can visualize the inferred guarding assignment as depicted below. It becomes evident that the HMM effectively captures Marco Wilson’s guarding assignment. The unique strength of HMMs is apparent when Marco Wilson reaches the same \(y\)-coordinate as the running back, Jerick McKinnon (KC #1). At this point, clustering algorithms that disregard the temporal component would briefly suggest a change in the guarding assignment, introducing noise into summary statistics such as the total number of switches predicted by the model. In contrast, the HMM consistently and accurately maintains the correct coverage assignment throughout the motion as a consequence of its temporal persistence.

Coverage Prediction according to a hidden Markov model.

3.2 Enhanced post-motion model

To incorporate pre-snap information, we re-train the post-motion model to predict whether the defense employs a man or zone coverage scheme, now integrating results from the HMM analysis as additional features. A caveat of the aforementioned decoded state probabilities is that they form a multi-dimensional time series, which complicates their direct use as features in the previously described models. This challenge can, however, be remedied by employing suitable summary statistics. To this end, we first determine for each defender the most likely offensive player \(k = 1, \ldots, 5\) to be guarded at each time point \(t = 1, \ldots, T_n\), where \(T_n\) is the length of the \(n\)-th play, and count the number of state switches, i.e. the number of times a defender’s most likely guarded offensive player differs between two consecutive time points \(t\) and \(t+1\). From this, we calculate some simple summary statistics, specifically 1) the sum of state switches, 2) the average number of state switches, and 3) the number of defenders that switch offensive players during a play. Additionally, we use the aforementioned decoded state probabilities to calculate a more elaborate statistic, the mean entropy across defenders \(j = 1, \ldots, 5\) in each offensive play \(n = 1,\ldots, N\):

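\[H(n) = - \frac{1}{5}\sum_{j = 1}^5 \sum_{k=1}^{5} \left( \frac{1}{T_n} \sum_{t=1}^{T_n} \mathbb{1}\left(\arg\max_{i=1,\ldots,5} X^{(j)}_{t,i} = k\right) \cdot \log\left(\frac{1}{T_n} \sum_{t=1}^{T_n} \mathbb{1}\left(\arg\max_{i=1,\ldots,5} X^{(j)}_{t,i} = k\right)\right) \right)\]

Here, \(X^{(j)}_{t,i}\) denotes the decoded probability that defender \(j\) guards offensive player \(i\) at time point \(t\) of play \(n\), and we use the convention \(0 \log 0 = 0\). The inner average is thus the share of time points at which offensive player \(k\) is the most likely assignment of defender \(j\). As the entropy is a measure of uncertainty or randomness in a probability distribution, higher entropy indicates greater unpredictability, while lower entropy signifies more predictability. In our setting, we suspect that higher entropy values are associated with less persistent guarding allocations, while lower entropy values are indicative of more stable guarding assignments.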
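The summary statistics of this section can be sketched in a few lines. The following is a minimal pure-Python illustration, not the authors' R code: it assumes that the decoded assignments of one play are available as one sequence of most likely offensive players per defender, and it takes the "average number of state switches" to be the average per defender, which is an assumption. All function and key names are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence


def count_switches(states: Sequence[int]) -> int:
    """Number of time points t at which the decoded state differs from the one at t+1."""
    return sum(1 for a, b in zip(states, states[1:]) if a != b)


def occupancy_entropy(states: Sequence[int]) -> float:
    """Entropy of the share of time spent in each decoded state.

    States that are never visited do not appear in the Counter,
    which implements the convention 0 * log(0) = 0.
    """
    total = len(states)
    return -sum((c / total) * math.log(c / total) for c in Counter(states).values())


def hmm_summary_features(decoded: List[Sequence[int]]) -> Dict[str, float]:
    """decoded[j] is the most likely offensive player per time point for defender j."""
    switches = [count_switches(s) for s in decoded]
    return {
        "total_switches": float(sum(switches)),
        "mean_switches": sum(switches) / len(decoded),  # assumed: average per defender
        "n_switching_defenders": float(sum(1 for s in switches if s > 0)),
        "mean_entropy": sum(occupancy_entropy(s) for s in decoded) / len(decoded),
    }
```

A defender who stays on the same offensive player throughout the motion contributes zero switches and zero entropy, whereas a defender alternating between two players contributes several switches and an entropy of up to \(\log 2\).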

Results

Model and feature evaluation

The text reads somewhat clunky to me and is not that easy to understand.

Proposal (Robert):

We evaluate the predictive performance of our three models to assess how well pre-snap player movements help identify the correct defensive scheme. To do this, we split the data into two sets: 85% for training and 15% for testing. During the training phase, we use cross-validation to fine-tune the models.

To ensure robust and reliable results, we repeat the evaluation process 50 times, using different cv splits for each iteration. For each split, we train and fine-tune the model on the training data and then evaluate its performance on the test set. This approach helps minimize the risk of our findings being influenced by randomness, which is particularly important given the limited size of our dataset.
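The repeated evaluation scheme can be sketched as a generator of random train/test partitions. This is an illustration under the assumption that every repetition draws a fresh random 85%/15% partition of the plays; the text does not state whether the splits are stratified, so no stratification is applied here. The function name `repeated_holdout` and its parameters are hypothetical.

```python
import random
from typing import Iterator, List, Tuple


def repeated_holdout(
    n: int, test_frac: float = 0.15, repeats: int = 50, seed: int = 0
) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_idx, test_idx) pairs, one random partition per repetition."""
    rng = random.Random(seed)  # fixed seed so the sequence of splits is reproducible
    n_test = round(n * test_frac)
    for _ in range(repeats):
        idx = list(range(n))
        rng.shuffle(idx)
        yield sorted(idx[n_test:]), sorted(idx[:n_test])
```

In each repetition, the model would be tuned by cross-validation on the training indices only and then scored once on the held-out test indices, so that the spread of the 50 scores reflects the variability induced by the split.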

The plots are still very pixelated.

Robert: strange, they look fairly sharp on my end… I tried to change it a bit. If that does not improve things, I unfortunately do not know what else would help…

The plots above show the results from our 50 repetitions of the experiment. The first figure illustrates classification accuracy, using a threshold of 50%. Since this value is somewhat arbitrary and our dataset is imbalanced (\(\approx\) 75% zone coverage plays and \(\approx\) 25% man coverage plays), accuracy is a suboptimal metric for evaluating the results. Hence, we also present the AUC values of the three models (see Appendix for a discussion on evaluation metrics). Several observations emerge from these metrics. First, the more flexible xgboost model outperforms glmnet, but it also exhibits greater variation, indicating that additional data could enhance the tuning of this model. Second, the pre-motion model performs the worst, and once motion information is incorporated, we observe a substantial improvement in performance. While this is to be expected, we also see that the HMM post-motion model outperforms the model with only naive post-motion information. This indicates that the HMM features capture distinct information beyond simple motion features.

Team analyses

The figure is still missing somehow.

Main findings:

Plot: In general, motion improves the correct detection of coverage (the median is positive for most teams). Table: Giants (Titans): low number of motion plays, but very effective; 49 out of 60 (82%) of the motion plays improved coverage detection. Miami, Atlanta, SF: extensive use of motion that is still very effective. Raiders, Seattle, Rams: substantial use of motion, but very ineffective. Bills (Bengals): little use of motion, and also ineffective. Rouven, if you manage to write a nice text on the findings, that would be great!

Discussion

While we were clearly able to show that pre-snap motion facilitates the detection of the defensive coverage scheme, a limitation of our approach lies in the imperfect prediction accuracy of the pre-motion model, primarily due to insufficient hyperparameter tuning and the relatively small number of plays involving motion. [Comment: I would not start with the limitations but with what we have shown. In addition, we should leave out the insufficient hyperparameter tuning and focus only on the small dataset. We tune well enough, and it is also due to the dataset that we do not do more.] However, the primary focus of this project was on pre-snap motion, particularly on how to effectively translate this information into the hidden Markov model. Importantly, our approach is modular: the prediction model can be seamlessly replaced by another model, such as the NFL Next Gen Stats model, in the case of a richer data set. Combining both of these worlds could fully leverage the insights provided by the present data.

Proposal by Robert (with parts from above and ChatGPT help from below):

This project explored the use of hidden Markov models (HMMs) to enhance the prediction of defensive schemes in football by incorporating pre-snap player movements. Models augmented with HMM-derived features demonstrated improved predictive performance compared to those relying solely on naive motion data.

Comparing predictions from pre-motion and post-motion models provides insights into an offense’s ability to use motion effectively to identify defensive coverage schemes. Similarly, it allows for evaluating a defense’s skill in disguising their coverage and confusing the offense during pre-snap motion.

While our pre- and post-motion models are limited by the small sample size available, our HMM approach is modular, so these models can be replaced by another model in the case of a richer data set. Combining both of these worlds could fully leverage the insights provided by the present data.

From ChatGPT:

This project explored the use of hidden Markov models (HMMs) to enhance the prediction of defensive schemes in football by incorporating pre-snap player movements. Models augmented with HMM-derived features demonstrated improved predictive performance compared to those relying solely on naive motion data. The HMM effectively captured latent defensive behaviors, offering deeper insights into the defenders’ actions.

The HMM features allowed for more effective inference of guarding assignments, improving the distinction between man and zone coverage. This improvement underscores the HMM’s capacity to model the dynamic and often subtle nature of defensive strategies, capturing patterns that simple motion features cannot.

Looking ahead, the potential of HMMs goes beyond predicting man or zone defense. They could be applied to decode complex guarding assignments and analyze defensive adjustments in real-time, providing a powerful tool for deeper strategic insights in sports analytics.

Code

All code for data pre-processing, model training, prediction and player evaluation can be found here.

References

*Franks A, Miller A, Bornn L, Goldsberry K (2015). Characterizing the Spatial Structure of Defensive Skill in Professional Basketball. The Annals of Applied Statistics, 9(1), DOI:10.1214/14-AOAS799

*Koslik J (2024). LaMa: Fast Numerical Maximum Likelihood Estimation for Latent Markov Models. R package version 2.0.2, https://CRAN.R-project.org/package=LaMa.

*Zucchini W, MacDonald I, Langrock R (2016). Hidden Markov Models for Time Series - An Introduction Using R. CRC Press

Appendix

Feature engineering

Prior to the more involved feature engineering steps, we transform the coordinate system by redefining the \(x\)-variable as the \(x\)-distance to the endzone (such that all play directions are from right to left and the relevant endzone is at zero), and by changing the direction variable such that zero degrees represents heading straight towards the corresponding endzone. As mentioned in the main text, we use player features from 5 defensive and 5 offensive players. First, we standardize their \(x\)- and \(y\)-coordinates with respect to the football and order the players according to their \(y\)-coordinates, i.e., the first defender in our dataset is always the leftmost defensive player, while the first offensive player is always the rightmost one (offensive play direction is from left to right). Furthermore, for each player we compute the distance to the football and the orientation with respect to the quarterback.

Model features and comparison

As mentioned above, only a small amount of relevant data is available. To avoid overfitting, we therefore focus on a basic feature set of 32 variables for our main results: 6 convex-hull-related features (3 each for offense and defense), 20 player position features (standardized \(x\)- and \(y\)-coordinates of the 5 offensive and 5 defensive players), and 6 play-by-play features. However, using the more thoroughly crafted features described above, such as distances and orientations, we can enlarge the feature set to 67 variables in total by adding 30 distance variables (for each of the 10 relevant players the total distance, \(x\)-distance, and \(y\)-distance to the football) and 5 orientation variables (for each defender the orientation with respect to the QB).

In the following, we compare these enlarged models with the smaller ones used for our main results. When comparing predictions of a binary outcome variable, accuracy can be a misleading metric, since it depends on the threshold selected for classification. For probabilistic predictions, as obtained from the two model classes considered, one usually uses the naive threshold of 0.5. However, this choice is arbitrary and, especially for imbalanced data, changing the threshold may change the accuracy drastically. The AUC (area under the ROC curve), on the other hand, evaluates the performance of a model across all classification thresholds and thus avoids the subjective choice of a threshold. Another popular metric is the logloss (negative log-likelihood loss), which is often preferred because it is a proper scoring rule. While the logloss is the most principled choice mathematically, its interpretation is less intuitive. Specifically, whether the logloss of a model can be considered good depends on the (im)balance of the classes (as well as on the number of classes, which is only 2 in our binary case). Without going into further detail, we therefore focus on these two metrics for the evaluation of the models.
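The threshold dependence of accuracy, as opposed to AUC and logloss, can be illustrated with a toy example. The functions below are a minimal pure-Python sketch of the three metrics (the AUC is computed through its pairwise-ranking interpretation); they are not taken from the authors' code, and the labels and probabilities are made up for illustration, with 1 denoting man coverage.

```python
import math
from typing import Sequence


def accuracy(y: Sequence[int], p: Sequence[float], threshold: float = 0.5) -> float:
    """Share of correct class predictions when classifying as 1 if p >= threshold."""
    return sum(int(pi >= threshold) == yi for yi, pi in zip(y, p)) / len(y)


def auc(y: Sequence[int], p: Sequence[float]) -> float:
    """Probability that a random positive is ranked above a random negative (ties count half)."""
    pos = [pi for yi, pi in zip(y, p) if yi == 1]
    neg = [pi for yi, pi in zip(y, p) if yi == 0]
    wins = sum((a > b) + 0.5 * (a == b) for a in pos for b in neg)
    return wins / (len(pos) * len(neg))


def logloss(y: Sequence[int], p: Sequence[float], eps: float = 1e-15) -> float:
    """Mean negative Bernoulli log-likelihood; probabilities are clipped away from 0 and 1."""
    total = 0.0
    for yi, pi in zip(y, p):
        pi = min(max(pi, eps), 1.0 - eps)
        total -= yi * math.log(pi) + (1 - yi) * math.log(1.0 - pi)
    return total / len(y)


# Toy data: one man-coverage play (1) among three zone plays (0)
y_true = [1, 0, 0, 0]
p_man = [0.45, 0.40, 0.30, 0.10]

acc_default = accuracy(y_true, p_man)        # threshold 0.5: every play is classified as zone
acc_lowered = accuracy(y_true, p_man, 0.42)  # lower threshold: the man play is now detected
ranking_quality = auc(y_true, p_man)         # unaffected by the choice of threshold
```

In this example the ranking of the plays is perfect, yet the accuracy at the default threshold only equals the share of zone plays, which is the kind of behavior that makes accuracy hard to interpret on imbalanced data.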

We see that using the larger set of features (models containing “AF” in the name) seems to provide more information. For the glmnet model, which is simpler but known to handle a large number of features well, the versions including the larger feature set (orientations and distances) perform better than the corresponding smaller models. For the xgboost model, the version including all features performs on par with the one including only the small set of basic features. This suggests that, with more data, the model may still be improved.

Finally, we mention that we also varied the number of players used in the analyses. Presently, we use only information of 5 offensive and 5 defensive players. However, using more information resulted in worse performance of the models, thus we refrain from showing the results here.

Hidden Markov model

A hidden Markov model consists of an observed time series \(\{y_t\}_{t=1}^T\) and an unobserved first-order Markov chain \(\{ g_t\}_{t=1}^T\), with \(g_t \in \{1,\ldots,N\}\). In this case, at every time point \(t\), \(y_t\) is the \(y\)-coordinate of the defensive player and \(g_t\) proxies the offensive player to be guarded, i.e. the guarding assignment, so that \(N = 5\). The Markov chain is fully described by an initial distribution \(\boldsymbol{\delta}^{(1)} = \bigl( \Pr(g_1=1), \ldots, \Pr(g_1=N) \bigr)\) and a transition probability matrix (t.p.m.) \(\boldsymbol{\Gamma} = (\gamma_{ij})\), with \(\gamma_{ij} = \Pr(g_t = j \mid g_{t-1} = i), \ i,j = 1, \ldots, N\). The connection between the two stochastic processes arises from the assumption that the distribution of the observation \(y_t\) is fully determined by the state that is currently active. More formally, \[\begin{equation*} f(y_t \mid g_1, \ldots, g_T, y_1, \ldots, y_{t-1},y_{t+1},\ldots,y_T) = f(y_t \mid g_t = j), \qquad j \in \{1, \ldots, N\}, \end{equation*}\] which we denote by \(f_j(y_t)\) in short. In general, \(f_j\) can be any density or probability mass function depending on the type of data, and a typical choice is a parametric distribution with separate parameters for each latent state. Following the approach of Franks et al. (2015), we opt for a Gaussian distribution with a mean that is fully determined by the current \(y\)-coordinate of offensive player \(j \in \{1, \dots, N\}\) and a standard deviation that is shared across all states but estimated from the data.
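Written out under the Gaussian assumption above, the state-dependent density takes the following form. The symbol \(o_{t,j}\) for the \(y\)-coordinate of offensive player \(j\) at time \(t\) is notation introduced here for illustration only, and \(\sigma\) denotes the shared standard deviation.

\[ f_j(y_t) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left( -\frac{(y_t - o_{t,j})^2}{2\sigma^2} \right), \qquad j \in \{1, \ldots, N\}. \]

Because the mean is tied to the observed position of the offensive player, \(\sigma\) is the only free parameter of the state-dependent distributions.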

To fit the model, we use direct numerical likelihood maximization. The HMM likelihood for the motion of a specific defender in a specific play can be calculated using the so-called forward algorithm. It effectively performs a summation over all possible latent state sequences in an efficient manner, rendering the computational complexity linear in the number of observations. Time series of different defenders within the same play and of different plays are treated as independent, hence their log-likelihood contributions are summed to obtain the full log-likelihood of the training data. For the practical implementation, we wrote a custom likelihood function in R, using the function forward() and other convenience functions from the R package LaMa (Koslik, 2024) to speed up computations. Furthermore, we used the R package RTMB to make this likelihood function compatible with automatic differentiation, making the numerical optimization process more efficient and robust. The only parameters to be estimated are the transition probability matrix \(\boldsymbol{\Gamma}\) and the standard deviation of the Gaussian distribution. The initial distribution of the guarding assignment for each defender in each play arises from a deterministic assignment approach, based on spatial proximity at the beginning of the time series, i.e., the moment the line is set. Specifically, for each defender in the predefined set of defenders considered, we compute a weighted Euclidean distance (giving more weight to the \(y\)-coordinate) to the five offensive players that could potentially be guarded. The initial distribution is then set to 1 for the closest offensive player under this metric and 0 otherwise. While this approach proved to be sufficiently reliable in this application, it may be fine-tuned in future iterations.
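The forward recursion for a single defender in a single play can be sketched as follows. This is a pure-Python illustration of the scaled forward algorithm under the model described above and is not the authors' implementation, which relies on forward() from LaMa and on RTMB in R. The function names and the argument layout (`y_off[t][j]` holding the \(y\)-coordinate of offensive player \(j\) at time \(t\)) are assumptions made for this sketch.

```python
import math
from typing import List, Sequence


def gaussian_pdf(y: float, mean: float, sd: float) -> float:
    """Density of a normal distribution with the given mean and standard deviation."""
    z = (y - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def forward_loglik(
    y_def: Sequence[float],
    y_off: Sequence[Sequence[float]],
    gamma: List[List[float]],
    delta: Sequence[float],
    sd: float,
) -> float:
    """Log-likelihood of one defender's y-trajectory via the scaled forward algorithm.

    Note: this sketch has no protection against all densities underflowing to zero.
    """
    n = len(delta)
    # Initialisation: initial distribution times state-dependent densities at t = 1
    alpha = [delta[j] * gaussian_pdf(y_def[0], y_off[0][j], sd) for j in range(n)]
    scale = sum(alpha)
    loglik = math.log(scale)
    phi = [a / scale for a in alpha]  # normalised forward probabilities
    for t in range(1, len(y_def)):
        # One step: propagate through the t.p.m., then weight by the densities at time t
        alpha = [
            sum(phi[i] * gamma[i][j] for i in range(n))
            * gaussian_pdf(y_def[t], y_off[t][j], sd)
            for j in range(n)
        ]
        scale = sum(alpha)
        loglik += math.log(scale)  # accumulate the log of the scaling factors
        phi = [a / scale for a in alpha]
    return loglik
```

Under the independence assumption stated above, the objective to be maximized would be the sum of `forward_loglik` over all defenders and plays, viewed as a function of the entries of the t.p.m. and of `sd`.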

Having fitted an HMM to the data, we can use the model to predict the underlying state sequence based on the observations. This process is called state decoding, and two main approaches exist. So-called local decoding constructs the conditional distributions \[ \Pr(g_t = j \mid y_1, \dots, y_T) \] for each time point \(t\), while global decoding using the Viterbi algorithm finds the state sequence that maximizes the joint probability of the state sequence given the observations. Local decoding retains more probabilistic information as it provides a categorical state distribution for each time point, while global decoding is more suitable for visualization purposes as it provides the single state sequence that is most likely to have generated the observations. To obtain both the local state probabilities and the global state sequence, we used the functions stateprobs() and viterbi() that are also contained in the R package LaMa.
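Both decoding approaches can be sketched in pure Python. The code below is an illustration only and does not reproduce stateprobs() or viterbi() from LaMa; the function names, the argument layout and all numbers in the toy example are assumptions. The example mimics the situation discussed in the main text: the guarded player (state 0) moves across the formation and passes the \(y\)-coordinate of a second player (state 1).

```python
import math
from typing import List, Sequence


def emission_probs(y_def: Sequence[float], y_off: Sequence[Sequence[float]], sd: float) -> List[List[float]]:
    """allprobs[t][j]: Gaussian density of the defender's y given offensive player j's y."""
    c = sd * math.sqrt(2.0 * math.pi)
    return [[math.exp(-0.5 * ((y - m) / sd) ** 2) / c for m in row] for y, row in zip(y_def, y_off)]


def local_decoding(allprobs, gamma, delta) -> List[List[float]]:
    """Scaled forward-backward recursions; returns Pr(g_t = j | y_1, ..., y_T) for all t, j."""
    T, n = len(allprobs), len(delta)
    fwd, prev = [], None
    for t in range(T):
        if t == 0:
            a = [delta[j] * allprobs[0][j] for j in range(n)]
        else:
            a = [sum(prev[i] * gamma[i][j] for i in range(n)) * allprobs[t][j] for j in range(n)]
        s = sum(a)
        prev = [v / s for v in a]
        fwd.append(prev)
    bwd = [[1.0] * n for _ in range(T)]
    for t in range(T - 2, -1, -1):
        b = [sum(gamma[i][j] * allprobs[t + 1][j] * bwd[t + 1][j] for j in range(n)) for i in range(n)]
        s = sum(b)
        bwd[t] = [v / s for v in b]
    probs = []
    for t in range(T):
        w = [fwd[t][j] * bwd[t][j] for j in range(n)]
        s = sum(w)
        probs.append([v / s for v in w])  # scaling constants cancel in the normalisation
    return probs


def viterbi_path(allprobs, gamma, delta) -> List[int]:
    """Global decoding: most likely state sequence, computed on the log scale."""
    T, n = len(allprobs), len(delta)

    def log(v: float) -> float:
        return math.log(v) if v > 0 else float("-inf")

    score = [log(delta[j]) + log(allprobs[0][j]) for j in range(n)]
    back = []
    for t in range(1, T):
        ptr, new = [], []
        for j in range(n):
            best = max(range(n), key=lambda i: score[i] + log(gamma[i][j]))
            ptr.append(best)
            new.append(score[best] + log(gamma[best][j]) + log(allprobs[t][j]))
        back.append(ptr)
        score = new
    path = [max(range(n), key=lambda j: score[j])]
    for ptr in reversed(back):  # backtrack from the last time point
        path.append(ptr[path[-1]])
    return path[::-1]


# Toy example (made-up numbers): player 0 is in motion, player 1 is stationary at y = 14
y_off = [[10.0, 14.0], [12.0, 14.0], [14.0, 14.0], [16.0, 14.0], [18.0, 14.0]]
y_def = [10.2, 12.1, 14.0, 15.8, 18.1]
gamma = [[0.95, 0.05], [0.05, 0.95]]
delta = [1.0, 0.0]
allprobs = emission_probs(y_def, y_off, sd=1.0)
state_probs = local_decoding(allprobs, gamma, delta)
path = viterbi_path(allprobs, gamma, delta)
```

At the third time point both offensive players share the same \(y\)-coordinate, so the observation alone cannot separate the two states; the persistence encoded in the t.p.m. keeps the decoded assignment on player 0 in this example.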